{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Music segmentation\n",
    "\n",
    "<!-- {{ add_binder_block(page) }} -->"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction\n",
    "\n",
    "Music segmentation can be seen as a change point detection task and therefore can be carried out with `ruptures`.\n",
    "Roughly, it consists in finding the temporal boundaries of meaningful sections, e.g. the intro, verse, chorus and outro in a song.\n",
    "This is an important task in the field of music information retrieval.\n",
    "\n",
    "The adopted approach is summarized as follows:\n",
    "\n",
    "- the original sound is transformed into an informative (multivariate) representation;\n",
    "- mean shifts are detected in this new representation using a dynamic programming approach.\n",
    "\n",
    "In this example, we use the well-known tempogram representation, which is based on the onset strength envelope of the input signal, and captures tempo information [[Grosche2010]](#Grosche2010).\n",
    "\n",
    "To load and manipulate sound data, we use the [librosa package](https://librosa.org/doc/latest/index.html) [[McFee2015]](#McFee2015)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Setup\n",
    "\n",
    "First, we make the necessary imports."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import librosa\n",
    "import numpy as np\n",
    "from IPython.display import Audio, display\n",
    "\n",
    "import ruptures as rpt  # our package"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also define a utility function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def fig_ax(figsize=(15, 5), dpi=150):\n",
    "    \"\"\"Return a (matplotlib) figure and ax objects with given size.\"\"\"\n",
    "    return plt.subplots(figsize=figsize, dpi=dpi)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load the data\n",
    "\n",
    "A number of music files are available in [Librosa](https://librosa.org/doc/latest/index.html).\n",
    "See [here](https://librosa.org/doc/latest/recordings.html#description-of-examples) for a complete list.\n",
    "In this example, we choose the *Dance of the Sugar Plum Fairy* from *The Nutcracker* by Tchaikovsky.\n",
    "\n",
    "We can listen to the music as well as display the sound envelope."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "duration = 30  # in seconds\n",
    "signal, sampling_rate = librosa.load(librosa.ex(\"nutcracker\"), duration=duration)\n",
    "\n",
    "# listen to the music\n",
    "display(Audio(data=signal, rate=sampling_rate))\n",
    "\n",
    "# look at the envelope\n",
    "fig, ax = fig_ax()\n",
    "ax.plot(np.arange(signal.size) / sampling_rate, signal)\n",
    "ax.set_xlim(0, signal.size / sampling_rate)\n",
    "ax.set_xlabel(\"Time (s)\")\n",
    "_ = ax.set(title=\"Sound envelope\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Signal segmentation\n",
    "\n",
    "### Transform the signal into a tempogram\n",
    "\n",
    "The tempogram measures the tempo (measured in Beats Per Minute, BPM) profile along the time axis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compute the onset strength\n",
    "hop_length_tempo = 256\n",
    "oenv = librosa.onset.onset_strength(\n",
    "    y=signal, sr=sampling_rate, hop_length=hop_length_tempo\n",
    ")\n",
    "# Compute the tempogram\n",
    "tempogram = librosa.feature.tempogram(\n",
    "    onset_envelope=oenv,\n",
    "    sr=sampling_rate,\n",
    "    hop_length=hop_length_tempo,\n",
    ")\n",
    "# Display the tempogram\n",
    "fig, ax = fig_ax()\n",
    "_ = librosa.display.specshow(\n",
    "    tempogram,\n",
    "    ax=ax,\n",
    "    hop_length=hop_length_tempo,\n",
    "    sr=sampling_rate,\n",
    "    x_axis=\"s\",\n",
    "    y_axis=\"tempo\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Detection algorithm\n",
    "\n",
    "We choose to detect changes in the mean of the tempogram, which is a multivariate signal.\n",
    "This amounts to selecting the $L_2$ cost function (see [CostL2](../../user-guide/costs/costl2)).\n",
    "To that end, two methods are available in `ruptures`:\n",
    "\n",
    "- `rpt.Dynp(model=\"l2\")`\n",
    "- `rpt.KernelCPD(kernel=\"linear\")`\n",
    "\n",
    "Both will return the same results but the latter is implemented in C and therefore significatively faster.\n",
    "\n",
    "### Number of changes\n",
    "In order to choose the number of change points, we use the elbow method.\n",
    "In the change point detection setting, this heuritic consists in:\n",
    "\n",
    "- plotting the sum of costs for 1, 2,...,$K_{\\text{max}}$ change points,\n",
    "- picking the number of changes at the \"elbow\" of the curve.\n",
    "\n",
    "Intuitively, adding change points beyond the \"elbow\" only provides a marginal decrease of the sum of costs.\n",
    "\n",
    "Here, we set $K_{\\text{max}}$:=20.\n",
    "\n",
    "!!! note\n",
    "    In `rpt.Dynp` and `rpt.KernelCPD`, whenever a segmentation with $K$ changes is computed, all segmentations with 1,2,..., $K-1$ are also computed and stored.\n",
    "    Indeed, thanks to the dynamic programming approach, segmentations with less changes are avalaible for free as intermediate calculations.\n",
    "    Therefore, users who need to compute segmentations with several numbers of changes should start with the one with the most changes.\n",
    "\n",
    "In addition, note that, in `ruptures`, the sum of costs of a segmentation defined by a set of change points `bkps` can easily be computed using:\n",
    "\n",
    "```python\n",
    "algo = rpt.KernelCPD(kernel=\"linear\").fit(signal)\n",
    "algo.cost.sum_of_costs(bkps)\n",
    "```\n",
    "(Replace `rpt.KernelCPD` by the algorithm you are actually using, if different.)\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Choose detection method\n",
    "algo = rpt.KernelCPD(kernel=\"linear\").fit(tempogram.T)\n",
    "\n",
    "# Choose the number of changes (elbow heuristic)\n",
    "n_bkps_max = 20  # K_max\n",
    "# Start by computing the segmentation with most changes.\n",
    "# After start, all segmentations with 1, 2,..., K_max-1 changes are also available for free.\n",
    "_ = algo.predict(n_bkps_max)\n",
    "\n",
    "array_of_n_bkps = np.arange(1, n_bkps_max + 1)\n",
    "\n",
    "\n",
    "def get_sum_of_cost(algo, n_bkps) -> float:\n",
    "    \"\"\"Return the sum of costs for the change points `bkps`\"\"\"\n",
    "    bkps = algo.predict(n_bkps=n_bkps)\n",
    "    return algo.cost.sum_of_costs(bkps)\n",
    "\n",
    "\n",
    "fig, ax = fig_ax((7, 4))\n",
    "ax.plot(\n",
    "    array_of_n_bkps,\n",
    "    [get_sum_of_cost(algo=algo, n_bkps=n_bkps) for n_bkps in array_of_n_bkps],\n",
    "    \"-*\",\n",
    "    alpha=0.5,\n",
    ")\n",
    "ax.set_xticks(array_of_n_bkps)\n",
    "ax.set_xlabel(\"Number of change points\")\n",
    "ax.set_title(\"Sum of costs\")\n",
    "ax.grid(axis=\"x\")\n",
    "ax.set_xlim(0, n_bkps_max + 1)\n",
    "\n",
    "# Visually we choose n_bkps=5 (highlighted in red on the elbow plot)\n",
    "n_bkps = 5\n",
    "_ = ax.scatter([5], [get_sum_of_cost(algo=algo, n_bkps=5)], color=\"r\", s=100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Visually, we choose 5 change points** (highlighted in red on the elbow plot).\n",
    "\n",
    "\n",
    "### Results\n",
    "\n",
    "The tempogram can now be segmented into homogeous (from a tempo standpoint) portions.\n",
    "The results are show in the following figure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Segmentation\n",
    "bkps = algo.predict(n_bkps=n_bkps)\n",
    "# Convert the estimated change points (frame counts) to actual timestamps\n",
    "bkps_times = librosa.frames_to_time(bkps, sr=sampling_rate, hop_length=hop_length_tempo)\n",
    "\n",
    "# Displaying results\n",
    "fig, ax = fig_ax()\n",
    "_ = librosa.display.specshow(\n",
    "    tempogram,\n",
    "    ax=ax,\n",
    "    x_axis=\"s\",\n",
    "    y_axis=\"tempo\",\n",
    "    hop_length=hop_length_tempo,\n",
    "    sr=sampling_rate,\n",
    ")\n",
    "\n",
    "for b in bkps_times[:-1]:\n",
    "    ax.axvline(b, ls=\"--\", color=\"white\", lw=4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Visually, the estimated change points indeed separate portions of signal with a relatively constant tempo profile.\n",
    "Going back to the original music signal, this intuition can be verified by listening to the individual segments defined by the changes points."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compute change points corresponding indexes in original signal\n",
    "bkps_time_indexes = (sampling_rate * bkps_times).astype(int).tolist()\n",
    "\n",
    "for segment_number, (start, end) in enumerate(\n",
    "    rpt.utils.pairwise([0] + bkps_time_indexes), start=1\n",
    "):\n",
    "    segment = signal[start:end]\n",
    "    print(f\"Segment n°{segment_number} (duration: {segment.size/sampling_rate:.2f} s)\")\n",
    "    display(Audio(data=segment, rate=sampling_rate))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first segment corresponds to the soundless part of the signal (visible on the plot of the signal enveloppe).\n",
    "The following segments correspond to different rythmic portions and the associated change points occur when various instruments enter or exit the play."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "\n",
    "This example shows how to apply `ruptures` on a music segmentation task.\n",
    "More precisely, we detected mean shifts on a well-suited representation (the tempogram) of a music signal.\n",
    "The number of changes was heuristically determined (with the \"elbow\" method) and the results agreed with visually and auditory intuition.\n",
    "\n",
    "Such results can then be used to characterize the structure of music and songs, for music classification, recommandation, instrument recognition, etc.\n",
    "This procedure could also be enriched with other musically relevant representations (e.g. the chromagram) to detect other types of changes."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Authors\n",
    "\n",
    "This example notebook has been authored by Olivier Boulant and edited by Charles Truong. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References\n",
    "\n",
    "<a id=\"Grosche2010\">[Grosche2010]</a>\n",
    "Grosche, P., Müller, M., & Kurth, F. (2010). Cyclic tempogram - a mid-level tempo representation for music signals. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5522–5525.\n",
    "\n",
    "<a id=\"McFee2015\">[McFee2015]</a>\n",
    "McFee, B., Raffel, C., Liang, D., Ellis, D. P. W., McVicar, M., Battenberg, E., & Nieto, O. (2015). Librosa: audio and music signal analysis in Python. Proceedings of the Python in Science Conference, 8, 18–25.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  },
  "toc-autonumbering": false,
  "toc-showmarkdowntxt": false
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
